Subset and manipulate data
Subset to specific samples
subset.samples(obj, var='index', slist='None', keep0=False)
Returns an object subsetted to the selected samples.
obj is the object generated with files.load
var is the column heading in the metadata used for subsetting; if var='index', the sample names themselves are used
slist is a list of sample names or metadata labels to keep
if keep0=True, OTUs/ASVs that have zero reads after subsetting are kept; otherwise they are removed from the data.
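The effect of keep0 can be illustrated with a minimal sketch. It is not qdiv's implementation: the count table is a plain dict of dicts (qdiv uses pandas dataframes), and the function name subset_samples_sketch is hypothetical.

```python
# Toy count table: {sequence name: {sample name: read count}}.
# Illustration only; qdiv itself stores tables as pandas dataframes.
counts = {
    "ASV1": {"S1": 10, "S2": 0, "S3": 5},
    "ASV2": {"S1": 0, "S2": 7, "S3": 0},
    "ASV3": {"S1": 3, "S2": 2, "S3": 1},
}


def subset_samples_sketch(table, samples, keep0=False):
    """Keep only the listed samples; optionally drop sequences left with zero reads."""
    subset = {seq: {s: row[s] for s in samples} for seq, row in table.items()}
    if not keep0:
        # Remove sequences that have no reads in any of the remaining samples.
        subset = {seq: row for seq, row in subset.items() if sum(row.values()) > 0}
    return subset


dropped = subset_samples_sketch(counts, ["S1", "S3"])
kept = subset_samples_sketch(counts, ["S1", "S3"], keep0=True)
print(sorted(dropped))  # ASV2 has no reads in S1 or S3, so it is removed
print(sorted(kept))     # with keep0=True, ASV2 stays with zero counts
```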
Subset to specific OTUs/ASVs
subset.sequences(obj, svlist)
Returns an object subsetted to the selected OTUs/ASVs.
obj is the object generated with files.load
svlist is a list of OTU or ASV names that should be kept in the data set.
Subset to the most abundant OTUs/ASVs
subset.abundant_sequences(obj, number=25, method='sum')
Returns an object with only the most abundant OTUs/ASVs.
obj is the object generated with files.load
number specifies the number of OTUs/ASVs to keep
method is the method used to rank OTUs/ASVs. If method='sum', the OTUs/ASVs are ranked by the sum of their relative abundances across all samples. If method='max', they are ranked by their maximum relative abundance in any single sample.
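The two ranking methods can give different results, as the following sketch suggests. It uses a plain dict of dicts rather than a qdiv object, and most_abundant_sketch is a hypothetical name, so treat it as an illustration of the ranking idea only.

```python
# Toy count table: {sequence name: {sample name: read count}}.
counts = {
    "ASV1": {"S1": 40, "S2": 45},  # moderately abundant in both samples
    "ASV2": {"S1": 60, "S2": 0},   # dominant in S1 only
    "ASV3": {"S1": 0, "S2": 55},   # dominant in S2 only
}


def relative_abundances(table):
    """Divide each count by the total read count of its sample."""
    samples = list(next(iter(table.values())))
    totals = {s: sum(row[s] for row in table.values()) for s in samples}
    return {seq: {s: row[s] / totals[s] for s in samples} for seq, row in table.items()}


def most_abundant_sketch(table, number=25, method="sum"):
    """Rank sequences by summed or maximum relative abundance and keep the top ones."""
    ra = relative_abundances(table)
    score = sum if method == "sum" else max
    ranked = sorted(ra, key=lambda seq: score(ra[seq].values()), reverse=True)
    return ranked[:number]


top_sum = most_abundant_sketch(counts, number=2, method="sum")
top_max = most_abundant_sketch(counts, number=2, method="max")
print(top_sum)  # ASV1 ranks first because it is present in both samples
print(top_max)  # ASV1 drops out because it never dominates a single sample
```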
Subset based on taxonomic classification
subset.text_patterns(obj, subsetLevels=[], subsetPatterns=[])
Searches for specific text patterns among the taxonomic classifications. Returns an object subsetted to OTUs/ASVs matching those text patterns.
subsetLevels is a list of taxonomic levels in which text patterns are searched for, e.g. ['Family', 'Genus']
subsetPatterns is a list of text patterns to search for, e.g. ['Nitrosom', 'Brocadia']
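A rough sketch of pattern-based subsetting is shown below. It assumes plain, case-sensitive substring matching, which may differ from how qdiv matches patterns; the taxonomy dict and the name text_patterns_sketch are made up for illustration.

```python
# Toy taxonomy table: {sequence name: {taxonomic level: classification}}.
taxonomy = {
    "ASV1": {"Family": "Nitrosomonadaceae", "Genus": "Nitrosomonas"},
    "ASV2": {"Family": "Brocadiaceae", "Genus": "Candidatus Brocadia"},
    "ASV3": {"Family": "Comamonadaceae", "Genus": "Acidovorax"},
}


def text_patterns_sketch(taxonomy, subsetLevels, subsetPatterns):
    """Return names of sequences whose classification contains any pattern at any listed level."""
    return [
        seq
        for seq, lineage in taxonomy.items()
        if any(
            pattern in lineage.get(level, "")
            for level in subsetLevels
            for pattern in subsetPatterns
        )
    ]


matches = text_patterns_sketch(taxonomy, ["Family", "Genus"], ["Nitrosom", "Brocadia"])
genus_only = text_patterns_sketch(taxonomy, ["Genus"], ["Brocadiaceae"])
print(matches)     # ASV1 and ASV2 match; ASV3 does not
print(genus_only)  # the family name does not occur at genus level
```

Restricting subsetLevels narrows where a pattern may match, which helps when a short pattern would otherwise hit names at several levels.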
Merge samples
subset.merge_samples(obj, var='None', slist='None', keep0=False)
Returns an object where samples belonging to the same category (as defined in the metadata) have been merged.
var is the column heading in the metadata used to merge samples; the read counts for all samples with the same text in the var column are summed
slist is a list of labels in the metadata column that specify which samples to keep; if slist='None' (default), all samples are kept
if keep0=False, all OTUs/ASVs with zero counts after merging are discarded from the data.
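Merging by a metadata column can be sketched as follows. This is an illustration under simplifying assumptions, not qdiv's code: tables are plain dicts, the helper merge_samples_sketch is hypothetical, and Python's None stands in for the string default 'None' used by qdiv.

```python
# Toy count table and metadata; the column name "reactor" is made up.
counts = {
    "ASV1": {"S1": 10, "S2": 5, "S3": 0},
    "ASV2": {"S1": 0, "S2": 0, "S3": 8},
    "ASV3": {"S1": 2, "S2": 3, "S3": 4},
}
meta = {
    "S1": {"reactor": "A"},
    "S2": {"reactor": "A"},
    "S3": {"reactor": "B"},
}


def merge_samples_sketch(table, meta, var, slist=None, keep0=False):
    """Sum read counts over samples sharing the same label in metadata column `var`."""
    # Group sample names by their metadata label, keeping only labels in slist.
    groups = {}
    for sample, row in meta.items():
        label = row[var]
        if slist is None or label in slist:
            groups.setdefault(label, []).append(sample)
    merged = {
        seq: {label: sum(row[s] for s in members) for label, members in groups.items()}
        for seq, row in table.items()
    }
    if not keep0:
        # Discard sequences with zero reads in all merged samples.
        merged = {seq: row for seq, row in merged.items() if sum(row.values()) > 0}
    return merged


merged_all = merge_samples_sketch(counts, meta, "reactor")
only_a = merge_samples_sketch(counts, meta, "reactor", slist=["A"])
print(merged_all)
print(only_a)  # ASV2 occurs only in reactor B, so it is discarded here
```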
Rarefy
subset.rarefy_table(tab, depth='min', seed='None', replacement=False)
subset.rarefy_object(obj, depth='min', seed='None', replacement=False)
Rarefies a count table to a specific number of reads per sample. The function subset.rarefy_table() operates only on the count table and returns only a rarefied table. The function subset.rarefy_object() operates on the whole object and returns a whole object. This means that samples and OTUs/ASVs which might have been dropped from the count table during rarefaction are also dropped from the ‘ra’, ‘tax’, ‘seq’, and ‘meta’ dataframes of the object.
tab is the count table to be rarefied
obj is the object containing the count table to be rarefied
if depth='min', the minimum number of reads in a sample is used as the rarefaction depth; otherwise a number can be specified
seed sets a random state for reproducible results; use an integer.
if replacement=False, the procedure is similar to rarefaction without replacement; if replacement=True, rarefaction is done with replacement.
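The difference between the two sampling modes can be sketched with the standard library alone. This is a simplified illustration, not the algorithm qdiv uses: each sample is a plain dict, rarefy_sample is a hypothetical helper, and the whole read pool is expanded in memory, which only suits small examples.

```python
import random


def rarefy_sample(counts, depth, seed=None, replacement=False):
    """Subsample one sample's {sequence: reads} dict down to `depth` reads."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    names = list(counts)
    if replacement:
        # Each draw picks a sequence with probability proportional to its count.
        drawn = rng.choices(names, weights=[counts[n] for n in names], k=depth)
    else:
        # Expand to one entry per read, then draw reads without putting them back.
        pool = [n for n in names for _ in range(counts[n])]
        drawn = rng.sample(pool, depth)
    rarefied = {n: 0 for n in names}
    for n in drawn:
        rarefied[n] += 1
    return rarefied


samples = {
    "S1": {"ASV1": 60, "ASV2": 30, "ASV3": 10},
    "S2": {"ASV1": 5, "ASV2": 20, "ASV3": 15},
}
# Analogous to depth='min': use the smallest sample total as the depth.
depth = min(sum(c.values()) for c in samples.values())
rarefied = {s: rarefy_sample(c, depth, seed=1) for s, c in samples.items()}
print(depth)
print(rarefied)
```

Without replacement, a rarefied count can never exceed the original count, and a sample already at the target depth is returned unchanged. With replacement, neither property is guaranteed.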
Consensus table
subset.consensus(objlist, keepObj='best', taxa='None', alreadyAligned=False, differentLengths=False, nameType='ASV', onlyReturnSeqs=False)
Takes a list of objects and returns a consensus object based on ASVs found in all. Information about the fraction of reads retained from the original objects is also provided.
objlist is a list of objects
keepObj specifies which object in objlist is kept after filtering based on common ASVs. Use an integer (0 is the first object, 1 is the second, etc.); 'best' means that the object with the highest fraction of its reads mapped to the common ASVs is kept
taxa specifies, as an integer, which object's taxa information is kept (0 is the first object, 1 is the second, etc.); if 'None', the taxa information in the kept object is used
if alreadyAligned=True, it is assumed that the subset.align_sequences function has already been run on the objects to make sure the same sequences in different objects have the same names
if differentLengths=True, it is assumed that the same ASV inferred with different bioinformatics pipelines could have different sequence lengths.
nameType is the label used for sequences (e.g. ASV or OTU)
if onlyReturnSeqs=True, only a dataframe with the shared ASVs is returned.
Example
import qdiv
cons_obj, info = qdiv.subset.consensus([obj1, obj2])
qdiv.stats.print_info(cons_obj)
print(info)
In the example above, cons_obj is the new consensus object constructed based on obj1 and obj2.
info contains information about the fraction of reads retained from obj1 and obj2, as well as the maximum relative abundance of reads lost in a sample in each of the original objects.
import qdiv
shared_seqs, info = qdiv.subset.consensus([obj1, obj2], onlyReturnSeqs=True)
In the example above, shared_seqs is a pandas dataframe with the shared sequences.
info just contains a text string saying that the shared ASVs were returned.
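The idea behind the consensus and the 'best' choice can be sketched roughly as below. This is only an illustration under simplifying assumptions: each object is reduced to a dict of total read counts keyed by the sequence itself, sequences are compared by exact string identity, and consensus_sketch is a hypothetical name rather than part of qdiv.

```python
# Each toy "object" maps a DNA sequence to its total read count.
obj1 = {"ACGT": 80, "ACGA": 15, "TTGA": 5}
obj2 = {"ACGT": 50, "ACGA": 40, "GGCC": 10}


def consensus_sketch(objlist):
    """Find sequences present in every object and the fraction of reads each object retains."""
    shared = set(objlist[0])
    for obj in objlist[1:]:
        shared &= set(obj)
    retained = [sum(obj[s] for s in shared) / sum(obj.values()) for obj in objlist]
    return shared, retained


shared, retained = consensus_sketch([obj1, obj2])
# Mirrors the idea of keepObj='best': pick the object that retains the most reads.
best = retained.index(max(retained))
print(sorted(shared))
print(retained)
print(best)
```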
Merge objects
subset.merge_objects(objlist, alreadyAligned=False, differentLengths=False, nameType='ASV')
Takes a list of objects and returns a merged object including all OTUs/ASVs and samples.
objlist is a list of objects
if alreadyAligned=True, it is assumed that the subset.align_sequences function has already been run on the objects to make sure the same sequences in different objects have the same names
if differentLengths=True, it is assumed that the same ASV inferred with different bioinformatics pipelines could have different sequence lengths.
nameType is the label used for sequences (e.g. ASV or OTU)
Example
import qdiv
merged_obj = qdiv.subset.merge_objects([obj1, obj2])
qdiv.stats.print_info(merged_obj)
In the example above, merged_obj is the new object constructed by combining obj1 and obj2.
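In contrast to a consensus, a merge keeps the union of sequences and samples. The sketch below illustrates this with plain dicts; it assumes sample names are unique across objects and that sequences are matched by exact string identity, and merge_objects_sketch is a hypothetical helper, not qdiv's implementation.

```python
# Each toy "object" maps a sample name to {sequence: read count}.
obj1 = {"S1": {"ACGT": 80, "TTGA": 20}}
obj2 = {"S2": {"ACGT": 50, "GGCC": 50}}


def merge_objects_sketch(objlist):
    """Combine all samples and all sequences; missing combinations get zero reads."""
    all_seqs = sorted({seq for obj in objlist for row in obj.values() for seq in row})
    merged = {}
    for obj in objlist:
        for sample, row in obj.items():
            merged[sample] = {seq: row.get(seq, 0) for seq in all_seqs}
    return merged


merged = merge_objects_sketch([obj1, obj2])
print(merged)
```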